{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2b00a7de-c563-4d41-b8ab-84128f0f3069",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import requests\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "import os\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "\n",
    "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "daa9de5c-6241-46aa-a51d-98bc154ee6e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Constants\n",
    "\n",
    "OLLAMA_API = \"http://localhost:11434/api/chat\"\n",
    "HEADERS = {\"Content-Type\": \"application/json\"}\n",
    "MODEL = \"llama3.2\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f3bf8e10-5770-4081-b099-cf83e41126b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "headers = {\n",
    " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "}\n",
    "\n",
    "class Website:\n",
    "\n",
    "    def __init__(self, url):\n",
    "        \"\"\"\n",
    "        Create this Website object from the given url using the BeautifulSoup library\n",
    "        \"\"\"\n",
    "        self.url = url\n",
    "        response = requests.get(url, headers=headers)\n",
    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "            irrelevant.decompose()\n",
    "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e6a5d9d5-a617-4ea4-9b03-3eae2dd4520d",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = \"You are an assistant that analyzes the contents of a website \\\n",
    "and provides a short summary, ignoring text that might be navigation related. \\\n",
    "Respond in markdown.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c01a0e24-ccf7-4359-a731-dcda6bfc5023",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "def user_prompt_for(website):\n",
    "    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    user_prompt += \"\\nThe contents of this website is as follows; \\\n",
    "please provide a short summary of this website in markdown. \\\n",
    "If it includes news or announcements, then summarize these too.\\n\\n\"\n",
    "    user_prompt += website.text\n",
    "    return user_prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "43f5df54-a34b-42cd-a6b2-e28996a84ff7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b79e63fb-b741-4f4a-8bc4-66a60feef2cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "def summarize(url):\n",
    "    website = Website(url)\n",
    "    # response = openai.chat.completions.create(\n",
    "    #     model = \"gpt-4o-mini\",\n",
    "    #     messages = messages_for(website)\n",
    "    # )\n",
    "    response = ollama_via_openai.chat.completions.create(\n",
    "        model=\"deepseek-r1:1.5b\",\n",
    "        messages=messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "3298a858-e5de-4804-b188-06c0ce6471b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "def display_summary(url):\n",
    "    summary = summarize(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "40dcb721-f807-47bf-9d18-f6a649c371e0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<think>\n",
       "Alright, so I'm trying to figure out why CNN's \"World Today\" section is showing the word \"Gulf.\" I remember that \"Gulf\" was a significant event, but not sure if there have been any other notable events in the Gulf region around recent times. Is it about oil production or something else? Maybe natural gas?\n",
       "Wait, I've heard of an oil-producing company called BP, which is associated with the Gulf. They're big on Shell and Opec members like Russia. But does that make \"Gulf\" part of their content? Or is it more about how the Gulf looks in visual terms?\n",
       "\n",
       "I'm also thinking about news categories—global events, geopolitical stuff, tech, culture, etc.—and maybe a recent oil production related report. Maybe they show real-time data about how much oil has been produced or the availability in some country nearby.\n",
       "\n",
       "Hmm, I'm not sure if \"Gulf\" refers to Earth's location or just part of the Gulf region because they have two Gulfagoras islands as landmarks that sound similar to \"Gulf.\" Could it be a typo where the team named these after BP? But then if it were Gulfagoras, wouldn't they be more about geography than the actual oil aspect? Or maybe they were the names when BP was discovered?\n",
       "\n",
       "I think CNN is aiming for current news, so they probably show how something or another company has done in the Gulf. Since BP is big there, with data like gas production and costs, that might fit under geopolitical or energy news. But why specifically \"Gulf\"? Maybe to align with how some other teams handle Earth-related news.\n",
       "\n",
       "Overall, I'm leaning towards it being about BP's recent activities due to their oil involvement in the Gulf region. They typically cover geological products, energy, and production reports, so \"Gulf\" probably refers to those specific topics or regions.\n",
       "</think>\n",
       "\n",
       " CNN's \"World Today\" section shows \"Gulf,\" likely referring to BP's exploration of Earth resources, particularly in relation to their oil production in the Gulf region. BP is well-known for being part of the Gulf region and associated with major companies like Shell, Opec members such as Russia, and significant geological features like the Gulfagoras islands, which might be named after them due to BP's location or discovery nearby. Therefore, this title reflects their current geopolitical news focusing on energy-related activities in the Gulf.\n",
       "\n",
       "**Answer:** The \"Gulf\" section likely refers to the oil production activities of BP, linking them to the Gulf region and geologically significant features in Earth terms."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_summary(\"https://cnn.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f41a586e-fe1f-4040-8ebb-31887981907f",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
